Slovak Web Discussion Corpus
Abstract
The proposed form and annotations should enable further classical and computational linguistic research on a contemporary mode of communication: web discussions. The corpus size should be sufficient for statistical analysis of word connotations, for language modeling, and for document classification, clustering, and information retrieval tasks. Future work will focus on processing data from social networks.
Similar resources
Slovak National Corpus tools and resources
The article presents the current state of affairs in several projects conducted by the Slovak National Corpus department of the Ľ. Štúr Institute of Linguistics, Slovak Academy of Sciences. We describe the Slovak National Corpus, the Corpus of Spoken Slovak, tools used for linguistic analysis, and an ongoing effort to create Slovak WordNet. The Slovak National Corpus is a huge...
5th Workshop on Intelligent and Knowledge oriented Technologies
Are Web Corpora Inferior? The Case of Czech and Slovak
Our paper describes an experiment aimed at assessing the lexical coverage of web corpora, in comparison with traditional ones, for two closely related Slavic languages from the lexicographers’ perspective. The preliminary results show that web corpora should not be considered “inferior”, but rather “different”.
TUKE-BNews-SK: Slovak Broadcast News Corpus Construction and Evaluation
This article presents an overview of the existing acoustic corpora suitable for the automatic transcription of broadcast news in Slovak. The TUKE-BNews-SK database, created in our department, was built to support the development of applications for automatic broadcast news processing and spontaneous speech recognition of Slovak. The audio corpus is composed of 479 Slovak TV...
Opinion Mining in Conversational Content within Web Discussions and Commentaries
The paper focuses on the problem of opinion classification in web discussions and commentaries. It reviews various approaches known in this field. It also describes novel methods designed for processing short conversational content, with an emphasis on dynamic analysis. This dynamic analysis focuses mainly on processing negations and intensifiers within the opini...